Scientific Python antipatterns advent calendar day thirteen

For today, a brief example since it’s the weekend! As a reminder, I’ll post one tiny example per day with the intention that they should only take a couple of minutes to read.

If you want to read them all but can’t be bothered checking this website each day, sign up for the mailing list:

Sign up for the mailing list

and I’ll send a single email at the end with links to them all.

Manually merging/zipping lists

A fairly common task, especially in data-heavy scientific programming, is to take two collections and process them in pairs. This is easier to explain with an example:

fruits = [
    "apple", "banana", "orange",
    "elderberry", "grape",
    "olive", # technically a fruit!
]

colours = [
    'red', 'yellow', 'orange',
    'purple', 'green',
    'olive' # technically a colour too!
]

Let’s say we want to match up each fruit with its colour. Iterating over one of the lists is easy:

for fruit in fruits:
    print(fruit)
apple
banana
orange
elderberry
grape
olive

but then it’s not obvious how to get the matching element from the second list. Looking up each element’s position with the list’s index method sometimes works:

for fruit in fruits:
    index  = fruits.index(fruit)
    colour = colours[index]
    print(fruit, colour)
apple red
banana yellow
orange orange
elderberry purple
grape green
olive olive

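but only as long as the elements in the first list are unique, since index always returns the position of the first match (and it has to search the list again on every iteration). Looping over a range of positions is slightly better: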

for index in range(len(fruits)):
    fruit = fruits[index]
    colour = colours[index]
    print(fruit, colour)
apple red
banana yellow
orange orange
elderberry purple
grape green
olive olive

but there’s a built-in Python tool for handling this situation: zip. The job of zip is to take two lists and return their elements paired up by position:

list(zip(fruits, colours))
[('apple', 'red'),
 ('banana', 'yellow'),
 ('orange', 'orange'),
 ('elderberry', 'purple'),
 ('grape', 'green'),
 ('olive', 'olive')]

zip is lazy, so in the above code we need to wrap it in list to see the pairs, but when we put it in a loop we don’t need list:

for pair in zip(fruits, colours):
    print(pair)
('apple', 'red')
('banana', 'yellow')
('orange', 'orange')
('elderberry', 'purple')
('grape', 'green')
('olive', 'olive')
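To make the "lazy" part concrete, here is a minimal sketch using two short lists whose names (some_fruits, some_colours) are made up for this illustration. A zip object hands out its pairs once; after it has been consumed there is nothing left in it:

```python
# hypothetical short lists, just for this illustration
some_fruits = ["apple", "banana", "orange"]
some_colours = ["red", "yellow", "orange"]

pairs = zip(some_fruits, some_colours)  # no pairs have been built yet

first_pass = list(pairs)   # consumes the zip object
second_pass = list(pairs)  # the same zip object is now exhausted

print(first_pass)
print(second_pass)
```

The first print shows the three pairs and the second shows an empty list, so if we need the pairs more than once we should either store the result of list or call zip again.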

We can split up each pair inside the loop:

for pair in zip(fruits, colours):
    fruit = pair[0]
    colour = pair[1]
    print(fruit, colour)
apple red
banana yellow
orange orange
elderberry purple
grape green
olive olive

but it’s easier and more Pythonic to do it in the for line:

for fruit, colour in zip(fruits, colours):
    print(fruit, colour)
apple red
banana yellow
orange orange
elderberry purple
grape green
olive olive

This structure allows us to let zip take care of matching up the elements by position, and lets us concentrate on the logic that we want inside the loop:

# print fruits that have colours named after them
for fruit, colour in zip(fruits, colours):
    if colour == fruit:
        print(fruit)
orange
olive
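Matching by position is also what protects us from the uniqueness problem mentioned earlier. As a small sketch, assume a made-up pair of lists (dup_fruits, dup_colours) where the same fruit appears twice with different colours, and compare the index approach with zip:

```python
# hypothetical lists with a repeated fruit
dup_fruits = ["apple", "apple", "banana"]
dup_colours = ["red", "green", "yellow"]

# list.index returns the position of the FIRST match,
# so both apples get paired with the first colour
by_index = []
for fruit in dup_fruits:
    position = dup_fruits.index(fruit)
    by_index.append((fruit, dup_colours[position]))

# zip pairs purely by position, so duplicates are not a problem
by_zip = list(zip(dup_fruits, dup_colours))

print(by_index)
print(by_zip)
```

With index the second apple is wrongly reported as red; with zip it is correctly paired with green.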

Bonus: zip isn’t limited to two lists; it will happily work with any number of them:

sizes = [
    "large", "large", "large",
    "small", "medium",
    "medium",
]

for fruit, colour, size in zip(fruits, colours, sizes):
    if colour == fruit:
        print(fruit, size)
orange large
olive medium
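One thing worth knowing when zipping several lists: zip stops as soon as the shortest input runs out, without raising an error. The sketch below uses made-up lists (names and prices, with one price deliberately missing) to show that behaviour, alongside itertools.zip_longest from the standard library, which pads the shorter input instead:

```python
from itertools import zip_longest

# hypothetical lists of unequal length
names = ["apple", "banana", "orange"]
prices = [0.5, 0.25]  # one price is missing

# zip silently drops the unmatched "orange"
truncated = list(zip(names, prices))

# zip_longest keeps it and fills the gap with None by default
padded = list(zip_longest(names, prices))

print(truncated)
print(padded)
```

If a length mismatch would indicate a bug in our data, Python 3.10 and later also accept zip(names, prices, strict=True), which raises a ValueError instead of truncating.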

Also bonus: if the logic that we want inside the loop is to construct a dictionary:

fruit_colours = {}
for fruit, colour in zip(fruits, colours):
    fruit_colours[fruit] = colour
fruit_colours
{'apple': 'red',
 'banana': 'yellow',
 'orange': 'orange',
 'elderberry': 'purple',
 'grape': 'green',
 'olive': 'olive'}

then we can take a nice shortcut and just pass the zip directly to the dict function:

dict(zip(fruits, colours))
{'apple': 'red',
 'banana': 'yellow',
 'orange': 'orange',
 'elderberry': 'purple',
 'grape': 'green',
 'olive': 'olive'}

One more time; if you want to see the rest of these little write-ups, sign up for the mailing list:

Sign up for the mailing list